The ideas in this paper are based on an analysis of statistical
explanation that uses the information transmitted by a theory [2].

Consider a theory that specifies a probability distribution on the
events of some domain, where for purposes of analysis we divide the
variables that describe the domain into two sets: [M], a set of variables
whose values are to be explained, and [S], a set of variables whose values
are used to explain the values of the variables in [M]. The information
transmitted by the theory is

$$ I(S,M) = H(S) + H(M) - H(S \times M), \qquad (1) $$

where H(S) and H(M) are the uncertainties of the events in [S] and
[M], respectively, and H(S×M) is the uncertainty of the joint events.

To illustrate this idea, consider the problem of explaining medical
symptoms such as fever, coughing, skin rash, and abdominal pain. The kinds
of variables available for explaining symptoms are facts in a patient's
medical history and information about the patient's recent contact with
other people with similar symptoms. For the moment, I will avoid reference
to disease entities and deal only with the data that are available to the
physician. I have to strain your imagination a bit to do this, but a theory
about skin rash and fever might run something like this: Let $M_1$ denote
a combination of fever and a blotchy skin rash, and let $M_2$ denote the
absence of this pattern of symptoms. Let [S] be a set of three variables:
(1) the patient's age, (2) whether he has been in contact with another
person showing the symptoms within the last month, and (3) whether he had
the symptoms himself at any previous time. If each of these variables had
three values, then [S] would partition the domain of people into $3^3 = 27$
sets: for example, one set could include all infants under the age of
two months who had been in contact with someone with the symptoms recently
and had not shown the symptoms themselves. The theory then would consist
of a set of probabilities of the 27 events distinguished in the explanans,
and 27 conditional probabilities, a value of $P(M_1 \mid S_i)$ for each
$S_i \in [S]$. These probabilities are sufficient to specify the
probabilities of all the joint events, and therefore the overall probability
of $M_1$ is specified also. Facts like "infants less than two months old
seldom have the symptoms" and "contact with a person who has the symptoms
increases the probability of having them, unless the person has had the
symptoms previously himself" would be incorporated in the conditional
probabilities, and facts like the proportion of people who have had the
symptoms and the proportion of people who are younger than two months of
age would be incorporated in the probabilities of the explanans.

Let

$$ a_i = P(S_i), \qquad c_k = P(M_k), \qquad p_{ik} = P(M_k \mid S_i). \qquad (2) $$


In relation to the example about symptoms, the $a_i$ are the proportions
of people in the various categories specified by age, medical history, and
recent contacts. The $c_k$ are the proportions of people who have or do
not have the symptoms now. The $p_{ik}$ are the conditional probabilities
of having the symptoms, given the various categories of age, medical
history, and recent contacts. The quantities in equation (1) are defined
as


$$ H(S) = -\sum_i a_i \log a_i, \qquad H(M) = -\sum_k c_k \log c_k, $$

$$ H(S \times M) = -\sum_i \sum_k a_i p_{ik} \log (a_i p_{ik}). \qquad (3) $$
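Equations (1) through (3) translate directly into a few lines of code. The sketch below is a minimal illustration, assuming natural logarithms (the unit used in the numerical examples later in the paper) and the convention that $0 \log 0 = 0$; the function names and the two-state numbers are hypothetical, chosen only to show the computation.

```python
import math

def entropy(probs):
    """Uncertainty -sum p log p in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def information_transmitted(a, p):
    """Return H(S), H(M), H(SxM) and I(S,M) as in equations (1)-(3).

    a[i]    -- a_i = P(S_i), probabilities of the explanans states
    p[i][k] -- p_ik = P(M_k | S_i), one row per explanans state
    """
    n, m = len(a), len(p[0])
    c = [sum(a[i] * p[i][k] for i in range(n)) for k in range(m)]   # c_k = P(M_k)
    joint = [a[i] * p[i][k] for i in range(n) for k in range(m)]      # P(S_i and M_k)
    h_s, h_m, h_sm = entropy(a), entropy(c), entropy(joint)
    return h_s, h_m, h_sm, h_s + h_m - h_sm

# Hypothetical two-state theory: the explanans "predicts" the explanandum 90% of the time
a = [0.5, 0.5]
p = [[0.9, 0.1],
     [0.1, 0.9]]
h_s, h_m, h_sm, info = information_transmitted(a, p)
print(f"H(S)={h_s:.3f}  H(M)={h_m:.3f}  H(SxM)={h_sm:.3f}  I(S,M)={info:.3f}")
```

With these numbers I(S,M) is about 0.368 nats, out of a possible H(M) = ln 2 ≈ 0.693. If every row of the $p_{ik}$ were identical, the explanans would carry no information about the explananda and I(S,M) would be zero.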
If we think about a theory in relation to its information transmitted,
we are immediately led to consider the overall properties of the theory.
Information transmitted is a measure of the reduction in uncertainty
brought about by the dependencies between the variables. Thus, it is an
index of the explanatory power of the theory in relation to the entire
domain of events that the theory deals with. In fact, the point of
introducing the analysis based on information transmitted was to provide
some concepts that would make it reasonable to consider the evaluation of
theories, rather than of single explanations. In my opinion, it is more
appropriate and useful to consider general properties of theories than it
is to deal with the status of single explanations. And the information
transmitted seems to capture some of the properties that are desirable for
a measure that is used to evaluate a theory. For example, a higher value of
information transmitted generally goes with a greater degree of testability
in Popper's sense, and a greater degree of predictive usefulness.

In this paper, I want to extend this line of analysis to the consideration
of theories that provide statistical explanations and that postulate
theoretical entities. (The earlier analysis was limited to relationships
between empirical variables.) To carry out this analysis, I need to
introduce a structure that is slightly more complex than the one described
above.

Consider a domain Ω, partitioned by three sets of variables: [S],
a set of empirical variables used as explanans; [M], a set of empirical
variables whose values are to be explained; and [T], a set of postulated
theoretical variables. The values of the theoretical variables are
assumed to be produced by the values of the explanans, and in turn to
produce the values of the explananda, according to statistical laws
specified by the theory. The sense in which I use the phrase "theoretical
variables produce the values of the explananda" is entirely neutral
as regards metaphysics. I simply mean that the conditional probabilities
of the empirical outcomes (the explananda) given any theoretical state
are the same, regardless of the value of the explanans that applies. In
other words, I assume that the probability law connecting any given
theoretical state with the explananda is independent of the conditions
that produced that theoretical state. Under this assumption, the theory
consists of three sets of probabilities: (1) a vector of probabilities
$a_i$, where

$$ a_i = P(S_i), \qquad (4) $$

(2) a set of conditional probabilities $q_{ij}$, linking the explanans to
the theoretical states, where


$$ q_{ij} = P(T_j \mid S_i), \qquad (5) $$


and (3) a set of conditional probabilities $r_{jk}$, linking the theoretical
states to the explananda, where


$$ r_{jk} = P(M_k \mid T_j). \qquad (6) $$

First, it may be remarked that these quantities relate to those described
at the beginning in a straightforward way. When theoretical variables
are not taken into account, a theory is specified by a vector of
probabilities of the explanans, the $a_i$, and a matrix of conditional
probabilities linking the explanans and the explananda, the $p_{ik}$.
Because the explananda depend on the explanans only through the theoretical
states, this matrix is just the product of the matrices of conditional
probabilities $q_{ij}$ and $r_{jk}$. In other words, the probabilities
linking the explanans and the explananda are specified by the probabilities
defined in equations (5) and (6):

$$ p_{ik} = \sum_j q_{ij} r_{jk}. \qquad (7) $$
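Equation (7) is an ordinary product of row-stochastic matrices. A minimal sketch follows, with a hypothetical three-state explanans, two theoretical states, and two values of the explananda (the numbers are illustrative only); it also checks that each row of the resulting $[p_{ik}]$ is a probability distribution, as it must be when the rows of $[q_{ij}]$ and $[r_{jk}]$ are.

```python
def compose(q, r):
    """Equation (7): p_ik = sum_j q_ij * r_jk.

    q[i][j] -- P(T_j | S_i); r[j][k] -- P(M_k | T_j). Valid under the assumption
    that the explananda depend on the explanans only through the theoretical state.
    """
    t, m = len(r), len(r[0])
    return [[sum(q_row[j] * r[j][k] for j in range(t)) for k in range(m)]
            for q_row in q]

# Hypothetical numbers, for illustration only
q = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
r = [[0.8, 0.2],
     [0.3, 0.7]]
p = compose(q, r)
for row in p:
    assert abs(sum(row) - 1.0) < 1e-12   # each row is still a distribution
    print([round(x, 2) for x in row])
```

The rows come out as (.75, .25), (.55, .45), and (.40, .60). Note that different pairs of $[q_{ij}]$ and $[r_{jk}]$ can yield the same $[p_{ik}]$, which is why the empirical data constrain only the product.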

Of course, a theory is testable because data can be used to check its
assumptions. In a theory of the kind described here, the data are the
empirical values of the $a_i$ and the $p_{ik}$, and these can be used to
test the theory. If the theory specifies numerical probabilities (rather
than merely the existence of specified states), then the process of testing
the theory is just that of comparing the theoretical values of the $a_i$
and the values of $p_{ik}$ calculated from equation (7) with empirical
values of $a_i$ and $p_{ik}$ that can be obtained by whatever means are
available.

When the theory just specifies that certain states exist, testability
involves the estimation of parameters, and I will leave that discussion
for later in this paper.

The main question that I want to deal with is the relationship between
theoretical variables and the information-theoretical properties of a
theory. The situation involves three matrices: one linking the empirical
explanans and the theoretical variables, a second linking the theoretical
variables and the empirical explananda, and a third, the product of the
first two, linking the empirical explanans and the empirical explananda.
We can calculate the information transmitted by each of these. As a
hypothetical example, consider a theory that specifies four possible
values of S and three possible values of M. This might, for example,
involve classification of medical information into four categories of
medical history and three categories of symptom patterns. In addition,
the theory postulates the existence of a theoretical state, such as a
disease entity that may be present or absent, giving two values of T.
To have a concrete illustration, suppose the values of the $a_i$, $q_{ij}$,
and $r_{jk}$ are

$$ [a_i] = [.20, .30, .10, .40], $$


The values of information transmitted by these matrices, using natural
logarithms, are

I(S,T) = 0.10, I(T,M) = 0.16.


The implications of this theory for dependencies between explanans
and explananda are described by the values of $p_{ik}$, which are


(9) 


Using the values of $a_i$ given earlier, the information transmitted is

I(S,M) = 0.03.


An interesting general fact is illustrated by the example. The value
of I(S,M) is never larger than either I(S,T) or I(T,M). This claim
can be proved as follows. Consider a state-space [Q] = [S] × [T]. The
conditional probabilities $P(M_k \mid Q_l)$ will equal the conditional
probabilities $r_{jk}$; hence, the information transmitted I(Q,M) will be
the same as I(T,M). (This is established by an argument given in [2]
in connection with equation (17) of that paper.) The state-space [S]
is a partition of [Q], and it can be shown that the information
transmitted by collapsing two or more of the states of a matrix cannot be
larger than the information transmitted by the original matrix. Hence
$I(S,M) \le I(Q,M) = I(T,M)$. The bound $I(S,M) \le I(S,T)$ follows in the
same way from the state-space [T] × [M]: because the explananda depend on
the explanans only through the theoretical states, the conditional
probabilities of the states of [S] given a state of [T] × [M] equal those
given the corresponding state of [T] alone, and [M] is a partition of
[T] × [M].
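The inequality can also be checked numerically. The sketch below is an illustration, not a proof: it draws random theories of the same size as the hypothetical example (four explanans states, two theoretical states, three explananda values) and confirms that I(S,M) never exceeds I(S,T) or I(T,M), and that collapsing two states of [S] never increases I(S,M). The random-matrix generator and the seed are arbitrary choices.

```python
import math
import random

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(XxY) for a joint table joint[x][y], in nats."""
    def h(ps):
        return -sum(v * math.log(v) for v in ps if v > 0)
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return h(rows) + h(cols) - h([v for row in joint for v in row])

def random_stochastic(n_rows, n_cols, rng):
    """Random matrix whose rows are probability distributions."""
    mat = [[rng.random() + 1e-9 for _ in range(n_cols)] for _ in range(n_rows)]
    return [[v / sum(row) for v in row] for row in mat]

rng = random.Random(0)
n, t, m = 4, 2, 3
for trial in range(1000):
    a = random_stochastic(1, n, rng)[0]          # P(S_i)
    q = random_stochastic(n, t, rng)             # P(T_j | S_i)
    r = random_stochastic(t, m, rng)             # P(M_k | T_j)
    b = [sum(a[i] * q[i][j] for i in range(n)) for j in range(t)]          # P(T_j)
    p = [[sum(q[i][j] * r[j][k] for j in range(t)) for k in range(m)]      # eq. (7)
         for i in range(n)]
    i_st = mutual_information([[a[i] * q[i][j] for j in range(t)] for i in range(n)])
    i_tm = mutual_information([[b[j] * r[j][k] for k in range(m)] for j in range(t)])
    i_sm = mutual_information([[a[i] * p[i][k] for k in range(m)] for i in range(n)])
    assert i_sm <= min(i_st, i_tm) + 1e-12
    # Collapsing explanans states S_1 and S_2 into one state cannot raise I(S,M)
    merged = [[a[0] * p[0][k] + a[1] * p[1][k] for k in range(m)]]
    merged += [[a[i] * p[i][k] for k in range(m)] for i in range(2, n)]
    assert mutual_information(merged) <= i_sm + 1e-12
print("all 1000 random cases satisfied both inequalities")
```

In information-theoretic terms this is the data-processing inequality for the chain S → T → M; the small tolerance only absorbs floating-point rounding.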

What this fact means is that the dependencies between the theoretical
variables and the empirical variables are stronger, in the sense of
approaching probabilities of one and zero, than are the dependencies
between the two sets of empirical variables. I am inclined to believe that
this fact is related to the intuitions that we have regarding the
desirability of theoretical explanations when the dependencies between
empirical variables are statistical. If the theory correctly specifies a
set of states that produce the phenomena to be explained in the sense
assumed here, then the explanation in terms of the theoretical states is
better, in the sense of information transmitted, than is the explanation
in terms of the empirical explanans.

On the other hand, the improved quality of the explanation obtained
by introducing theoretical states may be merely a "paper" profit. Without
further theoretical development, merely introducing theoretical variables
may not change the empirical content of the theory. However, further
developments are often guided by the postulated properties of theoretical
entities.

One kind of development involves the discovery of new empirical variables
that can be added to the explanans. Refer back to equation (8), and
imagine that at some stage of scientific investigation the probabilities
specified there represent the best available theory about the explananda
[M]. This is equivalent to saying that the best judgment that can be
made on the available evidence is that there are states $T_1$ and $T_2$
that cannot be distinguished directly in observations (at least with
present technology) and that are related to the explananda according to
probabilities given by the values of the $r_{jk}$. One research problem that
would be potentially worthwhile in such a situation would be the search
for improved knowledge about the conditions that produce the theoretical
states. If additional variables that are related to the theoretical states
could be discovered, then the likely outcome would be an increase in
I(S,T) and a corresponding increase in I(S,M). The limit of this
process of obtaining better knowledge about conditions producing the
theoretical states is a theory with information transmitted equal to I(T,M).
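The limiting case can be illustrated with a small computation. In this sketch the $a_i$ are those of the hypothetical example, while $[r_{jk}]$ and the two versions of $[q_{ij}]$ are hypothetical numbers chosen for illustration. Both versions of $[q_{ij}]$ give the two theoretical states the same marginal probabilities (.5 each), so I(T,M) is identical in the two cases; when the explanans determines the theoretical state exactly, I(S,M) rises to equal I(T,M).

```python
import math

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(XxY) for a joint table joint[x][y], in nats."""
    def h(ps):
        return -sum(v * math.log(v) for v in ps if v > 0)
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return h(rows) + h(cols) - h([v for row in joint for v in row])

def transmitted(a, q, r):
    """Return I(S,T), I(T,M), I(S,M) for a theory given by a_i, q_ij, r_jk."""
    n, t, m = len(a), len(r), len(r[0])
    b = [sum(a[i] * q[i][j] for i in range(n)) for j in range(t)]      # P(T_j)
    p = [[sum(q[i][j] * r[j][k] for j in range(t)) for k in range(m)] for i in range(n)]
    return (mutual_information([[a[i] * q[i][j] for j in range(t)] for i in range(n)]),
            mutual_information([[b[j] * r[j][k] for k in range(m)] for j in range(t)]),
            mutual_information([[a[i] * p[i][k] for k in range(m)] for i in range(n)]))

a = [0.20, 0.30, 0.10, 0.40]                 # a_i from the hypothetical example
r = [[0.7, 0.2, 0.1],                        # hypothetical P(M_k | T_j)
     [0.1, 0.3, 0.6]]
weak = [[0.70, 0.30], [0.60, 0.40], [0.40, 0.60], [0.35, 0.65]]   # S says little about T
exact = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]           # S determines T
for label, q in [("weak", weak), ("exact", exact)]:
    i_st, i_tm, i_sm = transmitted(a, q, r)
    print(f"{label:5s}: I(S,T)={i_st:.3f}  I(T,M)={i_tm:.3f}  I(S,M)={i_sm:.3f}")
```

In the "exact" case each explanans state fixes the theoretical state, so $P(M_k \mid S_i)$ is just the row of $[r_{jk}]$ for that state and the two measures coincide.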

A second kind of research that could be motivated by a state of
knowledge such as that described by equation (8) involves an effort
to develop new theoretical variables in order to increase I(T,M). If
the conditional probabilities of the explananda, given the theoretical
states, are substantially different from one and zero, there is a strong
presumption that the theoretical description is incomplete and that
additional distinctions among theoretical states can be found to reduce
the conditional uncertainty of the explananda given the theoretical
states.

A third kind of development from a situation like that of equation
(8) could be the discovery of new phenomena that can be explained by the
empirical explanans and the theoretical variables of the theory. The
extension of a theory to additional explananda usually has the effect of
increasing the information transmitted, in this case the information
involving the relationship between theoretical states and the explananda.

Next, I will discuss theories of the form of equation (8), but with
free parameters rather than numerical values of the conditional
probabilities $q_{ij}$ and $r_{jk}$. Theoretical proposals that specify
numerical probabilities are very rare. However, a kind of situation that
occurs frequently is one in which dependencies among empirical variables
are known, and a theorist proposes to explain the dependencies in relation
to a set of theoretical states. In its weakest form, this kind of
theoretical proposal simply specifies a number of theoretical states, and
all the conditional probabilities are free parameters. Let n be the number
of values taken by the explanans and let m be the number of values taken
by the explananda. Then if t is the number of postulated states in the
theory, the form of the hypothesis is


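$$ (\exists \{q_{ij} : i = 1, \dots, n;\ j = 1, \dots, t\})\ (\exists \{r_{jk} : j = 1, \dots, t;\ k = 1, \dots, m\})\ (\forall i)(\forall k)\Big[\, p_{ik} = \sum_{j=1}^{t} q_{ij} r_{jk} \Big]. \qquad (10) $$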


That is, the theory asserts that it is possible to find a set of
probabilities (the $q_{ij}$ and $r_{jk}$) such that all the values of
$p_{ik}$ can be calculated using equation (7).


A hypothesis of this kind can be tested if the number of free parameters
is smaller than the number of quantities that can be obtained from the
empirical dependencies. In the weak form of the theory described above,
the number of free parameters is $(t-1)n + (m-1)t$, since each of the n
rows of $[q_{ij}]$ has t entries summing to one and each of the t rows of
$[r_{jk}]$ has m entries summing to one. The number of empirical quantities
is $(m-1)n$, since each of the n rows of $[p_{ik}]$ has m entries summing to
one. Then the theory is testable if $(t-1)n + (m-1)t < (m-1)n$, which
simplifies to the following inequality:


$$ t < \frac{mn}{m + n - 1}. \qquad (11) $$
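The counting argument behind inequality (11) can be written out directly. In this minimal sketch the function names are hypothetical; `max_testable_states` returns the largest integer t satisfying (11), using the equivalent integer condition $t(m + n - 1) \le mn - 1$.

```python
def free_parameters(n, m, t):
    """Weak-form theory: each of the n rows of [q_ij] has t-1 free entries,
    and each of the t rows of [r_jk] has m-1 free entries."""
    return (t - 1) * n + (m - 1) * t

def empirical_quantities(n, m):
    """Each of the n rows of the observed [p_ik] has m-1 free entries."""
    return (m - 1) * n

def is_testable(n, m, t):
    return free_parameters(n, m, t) < empirical_quantities(n, m)

def max_testable_states(n, m):
    """Largest integer t with t * (m + n - 1) < m * n, i.e. inequality (11); 0 if none."""
    return (m * n - 1) // (m + n - 1)

for n, m in [(4, 3), (10, 2), (10, 10)]:
    print(f"n={n}, m={m}: at most t={max_testable_states(n, m)} theoretical states")
```

For n = 4 and m = 3, the sizes used in the hypothetical example above, only t = 1 passes: a two-state theory in the weak form would have (2-1)4 + (3-1)2 = 8 free parameters, exactly as many as the (3-1)4 = 8 empirical quantities.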


It should be noted that equation (11) applies only when a theory is
stated without constraints on the theoretical parameters. Most frequently,
substantive hypotheses about the postulated states impose constraints
that reduce the number of free parameters of the theory. The most general
form of the kind of theoretical proposal we are discussing specifies
a set of free parameters $(\theta_1, \dots, \theta_s)$ such that each of the
conditional probabilities of the theory is a specified function of the
parameters. Then the condition for testability is that s must be less than
the number of empirical quantities, and the hypothesis has the form


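$$ (\exists \theta_1) \cdots (\exists \theta_s)(\forall i)(\forall k)\Big[\, p_{ik} = \sum_j q_{ij}(\theta_1, \dots, \theta_s)\, r_{jk}(\theta_1, \dots, \theta_s) \Big]. \qquad (12) $$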


The procedure for testing the hypothesis involves finding estimates of
the theoretical parameters that bring the values of $p_{ik}$ implied by the
theory as close as possible to the empirical values. Then the degree
of approximation between the theoretical and empirical values of $p_{ik}$
can be evaluated using standard statistical techniques.
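As one example of such a standard technique, a Pearson chi-square statistic can compare observed response counts with the $p_{ik}$ implied by the fitted parameters. The sketch below assumes that choice (a likelihood-ratio statistic would serve equally well), and the counts and probabilities in it are hypothetical. With s parameters estimated from the same data, the statistic is conventionally referred to a chi-square distribution with $(m-1)n - s$ degrees of freedom (for instance with `scipy.stats.chi2.sf`, if SciPy is available).

```python
def pearson_chi_square(counts, predicted):
    """Pearson X^2 = sum (observed - expected)^2 / expected.

    counts[i][k]    -- observed number of responses k under explanans state i
    predicted[i][k] -- p_ik implied by the theory at the estimated parameters
    """
    x2 = 0.0
    for obs_row, p_row in zip(counts, predicted):
        n_i = sum(obs_row)                      # trials observed under state i
        for obs, p in zip(obs_row, p_row):
            expected = n_i * p
            x2 += (obs - expected) ** 2 / expected
    return x2

def degrees_of_freedom(n, m, s):
    """Empirical quantities (m-1)n minus the s estimated parameters."""
    return (m - 1) * n - s

# Hypothetical data: 3 conditions x 100 trials, theory with one fitted parameter
counts = [[62, 38], [45, 55], [30, 70]]
predicted = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]]
x2 = pearson_chi_square(counts, predicted)
df = degrees_of_freedom(n=3, m=2, s=1)
print(f"X^2 = {x2:.3f} on {df} degrees of freedom")
```

Here the statistic is 7/6 ≈ 1.167 on 2 degrees of freedom; a small value relative to the degrees of freedom indicates that the theory's $p_{ik}$ are close to the observed proportions.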

There is enough structure in these general concepts now that a
realistic example can be introduced. I will describe a theory about the
detection of weak signals proposed by Luce [4]. The theory is used in the
analysis of experiments where on some trials a relatively weak stimulus
signal with some fixed energy level is presented along with a noisy input
that makes it difficult to tell whether the signal is there or not.

The noise has a fixed mean energy level, and on trials when the signal
is not presented the noise is presented alone. There are several
conditions designed to produce different response biases. These may
be produced by varying the payoffs for different kinds of correct
responses (identifying signals when they are present or correctly
saying that the signal is absent) or by imposing varying penalties for
different kinds of errors (missing signals or saying there is a signal
when only noise is presented). Another way of producing response bias is
to vary the overall proportion of trials when a signal is presented, thus
producing higher or lower expectations of the signal. The empirical
explananda are the subjects' responses: on each trial a subject says
either "yes" or "no," depending on whether he judges a signal to have
been present or absent. The explanans are the experimental conditions:
on each trial, there is some condition of payoff and a priori expectation
of a signal, and either the signal is presented or only noise is presented.
At the level of data, an experiment can be described as a set of
conditional frequencies

$$ p_{i1} = P(\text{yes} \mid \text{condition } i), $$

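where the conditions are given in the following equation:

$$
\begin{array}{l|cc}
 & \text{yes} & \text{no} \\ \hline
\text{condition } 1 & p_{11} & p_{12} \\
\text{condition } 2 & p_{21} & p_{22} \\
\text{condition } 3 & p_{31} & p_{32} \\
\text{condition } 4 & p_{41} & p_{42} \\
\vdots & \vdots & \vdots \\
\text{condition } 2g-1 & p_{(2g-1)1} & p_{(2g-1)2} \\
\text{condition } 2g & p_{(2g)1} & p_{(2g)2}
\end{array}
\qquad (13)
$$

where g is the number of different motivational conditions.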

Luce's theory uses the assumption of a threshold of detection, and
the probability of exceeding the threshold depends only on whether the
signal was presented or not. The other theoretical variable is determined
by the motivational conditions. Thus, on each trial, the subject
is assumed to be in one of 2g theoretical states, as specified in
equation (14).
Here $D_h$ denotes a state in which the threshold of detection is not
exceeded in motivational condition h, and $\bar{D}_h$ denotes a state in
which the threshold is exceeded in motivational condition h. The
probability of exceeding the threshold when the signal is presented is p,
and the probability of exceeding the threshold when noise is presented
alone is q.

It is assumed that the motivational states affect the relationships
between the theoretical states and the responses. The states are assumed
to be ordered in their tendency to produce biases favoring "yes" responses;
let State 1 denote the condition producing the greatest reluctance to say
"yes." It is assumed that there is some motivational state f such that

$$
P(\text{yes} \mid D_h) = \begin{cases} 0 & \text{for } h \le f, \\ t_h & \text{for } h > f, \end{cases}
\qquad
P(\text{yes} \mid \bar{D}_h) = \begin{cases} u_h & \text{for } h \le f, \\ 1 & \text{for } h > f, \end{cases}
$$

with the $u_h$ and $t_h$ ordered monotonically with the values of h. In
other words, for states in which the subject is reluctant to say "yes," he
always says "no" if the threshold is not exceeded and divides his
responses between "yes" and "no" randomly when the threshold is exceeded;
for states in which the subject is reluctant to say "no," he always says
"yes" if the threshold is exceeded and divides his responses between "yes"
and "no" when the threshold is not exceeded. The matrix of conditional
probabilities connecting the theoretical states and the responses is
therefore

$$
\begin{array}{l|cc}
 & \text{yes} & \text{no} \\ \hline
D_h,\ h \le f & 0 & 1 \\
\bar{D}_h,\ h \le f & u_h & 1 - u_h \\
D_h,\ h > f & t_h & 1 - t_h \\
\bar{D}_h,\ h > f & 1 & 0
\end{array}
$$


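Recall that the theoretical values of the $p_{ik}$ that can be compared
with data are obtained by multiplying the matrices $[q_{ij}]$ and $[r_{jk}]$.
In the case of Luce's theory, this yields

$$
\begin{array}{l|cc}
 & \text{yes} & \text{no} \\ \hline
\text{signal},\ h \le f & p\,u_h & 1 - p\,u_h \\
\text{noise},\ h \le f & q\,u_h & 1 - q\,u_h \\
\text{signal},\ h > f & p + (1-p)t_h & (1-p)(1-t_h) \\
\text{noise},\ h > f & q + (1-q)t_h & (1-q)(1-t_h)
\end{array}
$$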


Luce's theory illustrates the kind of situation described in
connection with equation (12). A number of parameters are specified
(there are g+2 of them: p, q, the f values $u_h$, and the g-f values $t_h$),
and the theory asserts that the parameters determine the relationships
between explanans (in this case, experimental conditions) and theoretical
states, and between theoretical states and explananda (in this case,
judgments about whether a signal was present). The parameters therefore
specify a theoretical relationship between the explanans and the explananda.
In order to obtain a numerical relationship, the parameters must be
estimated from data, and the theory is testable if the number of free
parameters is less than the number of empirical quantities in the data. In
the present example, the number of empirical quantities is 2g (one
proportion of "yes" responses for each condition), so the theory is
testable whenever g+2 < 2g, that is, whenever g is greater than two.

Numerical values of the parameters must be estimated both to test
the theory and to specify the information-theoretical properties of the
theory. Putting this in another way, the information transmitted by
the theory is a function of the parameter values. To provide a further
illustration of the application of information theory to the analysis of
statistical explanations, I have calculated the information transmitted
by one special case of the theory. Suppose that g = 5 (that is, five
different motivational conditions are used) and the five conditions are
used equally often. Furthermore, suppose that signals are presented on
one-half of the trials under each motivational condition. This means
that each of the ten states constituting the explanans has probability .10,
and from this we can calculate

H(S) = ln 10 = 2.30.

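Now, suppose that an experiment is conducted and the estimated values
of p and q are .60 and .30, respectively. Recall that these are the
probabilities of exceeding the threshold on signal trials and noise
trials, respectively. This permits us to calculate the probabilities
of the theoretical states; the probability of each $D_h$ is .11 and the
probability of each $\bar{D}_h$ is .09. (Each motivational condition has
probability .20, and within it the threshold is exceeded with probability
.5(.60) + .5(.30) = .45, so $P(\bar{D}_h) = .20 \times .45 = .09$ and
$P(D_h) = .20 \times .55 = .11$.) We can then calculate

H(T) = 2.30, H(S×T) = 2.94.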

The remaining parameters are the conditional probabilities of saying
"yes" in the various theoretical states. Suppose it is estimated that
f = 3, and the values of the $u_h$ and $t_h$ parameters are

$u_1 = .40$, $u_2 = .70$, $u_3 = 1.00$, $t_4 = .50$, $t_5 = .80$.


It turns out that this implies that subjects will say "yes" with probability
.51 (that is, .09(.40 + .70 + 1.00) + [.11(.50) + .09] + [.11(.80) + .09]
= .512), and we can calculate

H(M) = .69, H(T×M) = 2.55.

Combining these values using equations (1) and (7), we arrive at
values of information transmitted as follows:

I(S,T) = 1.66, I(T,M) = 0.44, I(S,M) = 0.17.

Note again that the information transmitted by the relationships between
explanans and explananda is smaller than either of the quantities involving
the theoretical states.
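The numbers in this example can be reproduced from equations (1) and (7). The sketch below is a reconstruction under stated assumptions: it uses natural logarithms, represents each theoretical state as a pair (h, threshold exceeded?), and follows the response rule described in the text, with $u_h$ = P(yes | threshold exceeded) for h ≤ f and $t_h$ = P(yes | threshold not exceeded) for h > f. All variable names are hypothetical. Computed directly, it gives I(S,T) ≈ 1.656, I(T,M) ≈ 0.446, and I(S,M) ≈ 0.170; the small difference from the value 0.44 quoted in the text presumably comes from combining entropies that had already been rounded to two decimals.

```python
import math

def entropy(ps):
    return -sum(v * math.log(v) for v in ps if v > 0)

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(XxY) for a joint table joint[x][y], in nats."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return entropy(rows) + entropy(cols) - entropy([v for row in joint for v in row])

g, f = 5, 3
p_signal, q_noise = 0.60, 0.30          # P(threshold exceeded | signal), (| noise)
u = {1: 0.40, 2: 0.70, 3: 1.00}         # u_h = P(yes | exceeded), h <= f
t = {4: 0.50, 5: 0.80}                  # t_h = P(yes | not exceeded), h > f

# Explanans: (condition h, signal present?), each with probability 1/(2g) = .10
S = [(h, sig) for h in range(1, g + 1) for sig in (True, False)]
a = [1 / (2 * g)] * len(S)
# Theoretical states: (h, exceeded?); exceeded=False plays the role of D_h
T = [(h, ex) for h in range(1, g + 1) for ex in (False, True)]

def q_row(h, sig):
    """P(T_j | S_i): condition h can only lead to the two states of condition h."""
    p_ex = p_signal if sig else q_noise
    return [(p_ex if ex else 1 - p_ex) if hh == h else 0.0 for hh, ex in T]

def r_row(h, ex):
    """[P(yes), P(no)] for theoretical state (h, ex), following the response rule."""
    if h <= f:
        yes = u[h] if ex else 0.0
    else:
        yes = 1.0 if ex else t[h]
    return [yes, 1 - yes]

q = [q_row(h, sig) for h, sig in S]
r = [r_row(h, ex) for h, ex in T]
p = [[sum(qi[j] * r[j][k] for j in range(len(T))) for k in range(2)] for qi in q]  # eq. (7)
b = [sum(a[i] * q[i][j] for i in range(len(S))) for j in range(len(T))]           # P(T_j)

I_ST = mutual_information([[a[i] * q[i][j] for j in range(len(T))] for i in range(len(S))])
I_TM = mutual_information([[b[j] * r[j][k] for k in range(2)] for j in range(len(T))])
I_SM = mutual_information([[a[i] * p[i][k] for k in range(2)] for i in range(len(S))])
p_yes = sum(a[i] * p[i][0] for i in range(len(S)))
print(f"P(yes)={p_yes:.3f}  I(S,T)={I_ST:.3f}  I(T,M)={I_TM:.3f}  I(S,M)={I_SM:.3f}")
```

Changing `p_signal`, `q_noise`, `u`, or `t` shows directly how the information transmitted depends on the parameter values, as the text notes.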

Developments motivated by Luce's theory can be used to illustrate
the remarks made earlier about the kinds of research that provide
improvements in theories of this kind. One kind of development involves
applying the theory to more complex experiments. Luce's theory has been
applied to studies in which two different signals have been presented
on different trials. The experiments involved auditory detection, and
the signals were tones of different frequency. This change increases
the number of states in the set of explanans, and thus increases the
information transmitted by the theory. In addition, the subjects were
asked to identify the signal after they judged whether a signal was
presented. Thus, instead of just saying "yes" or "no," subjects also
classified the stimuli as "high" or "low." With a more detailed set of
explananda, additional explanatory power was obtained. Of course, this
experiment was not just cooked up to provide more detail for its own sake;
the threshold theory makes the strong prediction that when the threshold
is not exceeded (as is the case for all "no" responses when h > f) the
subject should have no information about which stimulus was presented,
and the experiment was designed in part to test this prediction.

The other kind of development involves adding to the complexity of the
theoretical description, guided by data that are not consistent with the
simpler theory. Krantz [3] has proposed such a theory, in which an
additional state $D^*_h$ is postulated for each motivational state. $D^*_h$ is a
state of strong detection, in which the subject is sure that a signal was
presented. In Krantz's theory, then, there are two postulated thresholds. If
the lower threshold is not exceeded the subject is in State $D_h$, and if the
higher threshold is exceeded the subject is in State $D^*_h$. It is assumed
that the subject always judges that a signal was presented when he is in
State $D^*_h$, regardless of the motivational state. The main reason for
complicating a theory, of course, is to correct apparent defects that are
revealed by failure of data to agree with predictions derived from the
theory. But this also has the effect of increasing the explanatory power of
the theory in the sense of information transmitted, as reflected in the
values of I(S,T) and I(T,M).

This example from the theory of perception illustrates several aspects
of the role of theoretical entities in statistical explanation. Other
examples could be used for the same purpose, from other areas of
psychological theory as well as from other fields. It also may be remarked
that the restriction to systems with discrete states does not affect the
main features of the analysis. Systems that involve continuous variables
can be analyzed from this point of view, and their general properties are
analogous to those of discrete-state systems, although the analysis of
systems with discrete states is easier to understand.

An important feature of the kind of theory that I have been discussing
is that it describes a system whose statistical properties do not change
over time. In such a stationary system, the main role of theoretical
entities seems to be heuristic, in that they guide the development
of new empirical and theoretical research and thus facilitate the
extension of knowledge. In the remainder of this paper I will discuss
systems that are not stationary, and in these systems the use of
theoretical entities can lead to a considerable simplification of the
statistical structure of a theory, in addition to their heuristic value.

The simplest kind of nonstationary system is illustrated by experiments
in learning and problem solving. In their simplest form, these
experiments consist of repeated trials where a subject is given
opportunities to study the material to be learned or is given information
relevant to solving the problem. I will impose a relatively stringent
condition of uniformity for the purpose of analysis here: I will ignore
differences in procedure that occur on different trials. When such
differences cannot be neglected, the situation would be analyzed as the
concatenation of different experiments.

At the beginning of an experiment, the subject either gives only
incorrect responses or he responds correctly with some probability due
to guessing, depending on the procedure of the experiment. As the subject
proceeds through the experiment, the probability of correct response
increases. The subject may reach a state in which he gives only correct
responses, or he may reach some asymptotic state in which the probability
of correct response is at some level less than one. Thus, the most
salient general feature of the experiment is an increase in the probability
of correct response from some initial value $c_1$ (where $c_1$ may be zero)
to some asymptotic level $c_\infty$ (where $c_\infty$ may be one).

The data from a learning or problem solving experiment are sequences
of responses given by the subjects. In general, the probability of
response on each trial n depends on the sequence of responses given
on the preceding n-1 trials. One natural way to consider the experiment,
then, is as a sequence of probabilistic events in which the
response on each trial n is an event to be explained, and the sequence
of responses on trials 1, 2, ..., n-1 is the event available to be used
as the explanans.

To illustrate the situation, consider the first four trials of an
experiment. The data are hypothetical, and were generated from a theory
that will be presented later. In the notation, a correct response is
denoted 0 and an error is denoted 1. On trial 1, the subject either
gives a correct response or an error, and this provides the first
state-space for the explanans. The conditional probabilities $p_{ik,2}$ are
just the probabilities of response on trial 2, conditional on the response
on trial 1. In the following equation, the numbers in parentheses to the
left of the first-trial states are the probabilities of those states, and
the probabilities of response on trial 2 are in parentheses at the top.
The information-theoretical quantities are

$$H_2(S) = 0.56,\quad H_2(M) = 0.67,\quad I_2(S,M) = 0.00.$$

The reason that no information is transmitted is that the response on
trial 2 is independent of the response on trial 1 in this situation.
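Equation (1) can be evaluated directly from a joint probability table. The sketch below is an illustration, not the author's procedure. It assumes natural logarithms, which appear to reproduce the figures reported here, and for trial 2 it assumes the values implied by the model presented later with c = .20 and g = .25: a correct response on trial 1 with probability .25, and on trial 2 with probability .40 whatever happened on trial 1. The function names are hypothetical.

```python
import math

def entropy(probs):
    """Uncertainty H, in natural-log units, of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def information_transmitted(joint):
    """Return H(S), H(M) and I(S,M) = H(S) + H(M) - H(SxM) for joint[s][m]."""
    h_s = entropy([sum(row) for row in joint])
    h_m = entropy([sum(col) for col in zip(*joint)])
    h_sm = entropy([p for row in joint for p in row])
    # I(S,M) is never negative; clamp tiny floating-point noise to zero.
    return h_s, h_m, max(0.0, h_s + h_m - h_sm)

# Assumed trial-2 values: responses on trials 1 and 2 are independent,
# so every cell is a product of the two marginal distributions.
p_trial1 = [0.25, 0.75]   # correct (0), error (1) on trial 1
p_trial2 = [0.40, 0.60]   # correct (0), error (1) on trial 2
joint = [[a * b for b in p_trial2] for a in p_trial1]

h_s, h_m, info = information_transmitted(joint)
print(f"H2(S) = {h_s:.2f}, H2(M) = {h_m:.2f}, I2(S,M) = {info:.2f}")
```

The printout is H2(S) = 0.56, H2(M) = 0.67, I2(S,M) = 0.00. Because the table is a product of its margins, H(SxM) equals H(S) + H(M) and nothing is transmitted.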
The responses on trial 3 are related to the sequences of responses
on trials 1 and 2.

Carrying out the information-theoretical calculations we obtain

$$H_3(S) = 1.24,\quad H_3(M) = 0.69,\quad I_3(S,M) = 0.04.$$

The  situation  for  trial  4  is  as  follows: 
$$H_4(S) = 1.88,\quad H_4(M) = 0.66,\quad I_4(S,M) = 0.17.$$
Clearly, this kind of calculation could be carried out indefinitely.
It leads quickly to a relatively unmanageable system; on trial n the
state-space that constitutes the explanans has $2^{n-1}$ members. On the
other hand, the system eventually becomes uninformative. Assuming that
the system eventually reaches a stable asymptotic level of response
probability, the responses will eventually become independent of
preceding sequences. In other words,

$$\lim_{n\to\infty} I_n(S,M) = 0.$$
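The growth of the explanans can be made concrete with a brute-force sketch. It assumes, as the text states, that the hypothetical data come from the all-or-none model presented later (learned items are always answered correctly, unlearned items are guessed correctly with probability g, and an unlearned item becomes learned with probability c after each trial), with c = .20 and g = .25 and natural logarithms. It enumerates all $2^{n-1}$ response histories. The helper `forward` is a hypothetical name for a standard forward recursion over the two postulated states, not a procedure described in the paper.

```python
import itertools
import math

def entropy(probs):
    """Uncertainty in natural-log units; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forward(history, c, g):
    """[P(history, T_n = L), P(history, T_n = U)] for n = len(history) + 1."""
    alpha = [0.0, 1.0]  # every item starts unlearned (U)
    for r in history:   # r = 0 for a correct response, 1 for an error
        w = [alpha[0] * (1.0 if r == 0 else 0.0),    # learned items never err
             alpha[1] * (g if r == 0 else 1.0 - g)]  # unlearned items guess
        alpha = [w[0] + c * w[1], (1.0 - c) * w[1]]  # U -> L with probability c
    return alpha

def info_sm(n, c=0.20, g=0.25):
    """Return (number of histories, I_n(S,M)) with S = responses on trials 1..n-1."""
    joint = []  # rows: histories; columns: correct, error on trial n
    for history in itertools.product((0, 1), repeat=n - 1):
        alpha = forward(history, c, g)
        joint.append([alpha[0] + g * alpha[1], (1.0 - g) * alpha[1]])
    h_s = entropy([sum(row) for row in joint])
    h_m = entropy([sum(col) for col in zip(*joint)])
    h_sm = entropy([p for row in joint for p in row])
    return len(joint), max(0.0, h_s + h_m - h_sm)  # clamp floating-point noise

for n in range(2, 11):
    size, info = info_sm(n)
    print(f"trial {n:2d}: {size:4d} histories in the explanans, I_n(S,M) = {info:.3f}")
```

Under these assumptions the sketch gives I_2(S,M) = 0.000 and I_3(S,M) = 0.044. The number of histories doubles with every trial, which is the source of the unmanageable growth described above.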

This fact makes it reasonable to think about a theory of learning
in relation to the sum of the values of $I_n$ across trials. Using this
line of thinking, our state of knowledge about a learning system would
be evaluated in regard to the extent to which the performance of a
learning subject could be predicted on the basis of his earlier performance,
and this seems like a reasonable way to proceed. For example, it fits with
our intuitions about discoveries that in fact count as additions to our
knowledge. When responses are measured in more detail, as by a finer
classification of errors or by measuring additional properties such as
time to respond, the effect is to increase the values of $I_n$ (or at least
not to reduce them), for the same reason that adding variables to the
explanans or the explananda never decreases the information transmitted.
In addition, when we analyze properties of the sequence of study trials or
the information that is presented, thus going beyond the assumption of
uniform trials that I have imposed on this analysis, we add new variables
to the explanans and thus also increase the values of $I_n$.

These  remarks  about  learning  systems  provide  a  framework  for  the 
methodological  analysis  of  nonstationary  systems  that  is  consistent  with 
the  information-theoretical  analysis  worked  out  earlier  for  stationary 
systems.  My  remaining  discussion  will  consider  the  role  of  theoretical 
entities  in  such  systems. 

The general structure that I will use for my remaining remarks involves
a state-space of postulated variables. On each trial there is a set of
postulated states $[T_n]$ related to the explanans $[S_n]$ and the
explananda $[M_n]$ by specified probability laws given as matrices of
conditional probabilities $[q_{ij,n}]$ and $[r_{jk,n}]$. Thus on each trial
the same kind of structure that we used earlier for stationary systems can
be applied to analyze the information-theoretical properties of a
nonstationary system.
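Because the postulated states mediate between explanans and explananda, the matrix linking them on trial n is the matrix product of $[q_{ij,n}]$ and $[r_{jk,n}]$, that is $p_{ik,n} = \sum_j q_{ij,n} r_{jk,n}$. This holds when the response depends on the explanans only through the theoretical state, which is the assumption built into the structure described here. The numbers below are borrowed, for illustration only, from the trial-3 values of the all-or-none model discussed later (c = .20, g = .25); the function name `compose` is hypothetical.

```python
def compose(q, r):
    """Return the explanans-to-explanandum matrix p_ik = sum_j q_ij * r_jk."""
    return [[sum(q_i[j] * r[j][k] for j in range(len(r))) for k in range(len(r[0]))]
            for q_i in q]

# Rows of q: trial-3 explanans 00, 01, 10, 11; columns: states L, U.
q = [[0.60, 0.40], [0.20, 0.80], [0.60, 0.40], [0.20, 0.80]]
# Rows of r: states L, U; columns: response 0 (correct), 1 (error), with g = .25.
r = [[1.00, 0.00], [0.25, 0.75]]

for history, row in zip(["00", "01", "10", "11"], compose(q, r)):
    print(history, [round(p, 2) for p in row])
```

Each row of the result is again a probability distribution over responses; here a history ending in a correct response predicts a correct response on trial 3 with probability .70, and one ending in an error with probability .40.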

In principle, the theoretical states may be as complex as the
theorist wishes. However, a number of simplifying restrictions
are frequently used. The first is that the theoretical state-space
$[T_n]$ is the same on every trial; I will denote it $[T]$. Secondly, the
conditional probabilities of responses given theoretical states are
constant, so that $[r_{jk,n}]$ is a constant matrix $[r_{jk}]$. Finally, and
perhaps of greatest significance, the sequence of theoretical states that
occurs over trials is assumed to be governed by a probability law that is
specified by the theory.

Before considering the nature of the probabilities connecting the
theoretical states from trial to trial, it may be noted that the existence
of any probability law governing the transitions between theoretical
states, along with probabilities relating theoretical states and observable
responses, is sufficient to specify empirical probabilities of the kind
presented in the illustration given above. The sequence of states is a
stochastic process with trial outcomes $T_1, T_2, \ldots$. The probability
of any sequence of outcomes can be calculated using the transition
probabilities of the system. Given a sequence of theoretical states, the
probability of any response sequence can be calculated using the
probabilities relating the theoretical states and the responses; summing
over all state sequences, each weighted by its probability, then gives the
probability of the response sequence itself. The conditional
probabilities have the form

$$P(M_{k_n,n} \mid M_{k_1,1}, M_{k_2,2}, \ldots, M_{k_{n-1},n-1}) = \frac{P(M_{k_1,1}, M_{k_2,2}, \ldots, M_{k_{n-1},n-1}, M_{k_n,n})}{P(M_{k_1,1}, M_{k_2,2}, \ldots, M_{k_{n-1},n-1})}.$$

Since the probabilities of all the sequences can be calculated, so can
the conditional probabilities.
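A direct, if inefficient, way to follow this recipe is to sum over every sequence of theoretical states. The sketch assumes the two-state all-or-none model introduced below (states L and U, learning probability c, guessing probability g) as a concrete case, with responses coded 0 for correct and 1 for error; all function names are hypothetical. The number of state sequences grows as $2^n$, which is part of why the Markov structure discussed next matters.

```python
import itertools

L, U = 0, 1  # postulated states: learned, unlearned

def path_prob(states, c):
    """P(T_1, ..., T_n): start in U, U -> L with probability c, L absorbing."""
    if states[0] != U:
        return 0.0
    p = 1.0
    for prev, nxt in zip(states, states[1:]):
        if prev == L:
            p *= 1.0 if nxt == L else 0.0
        else:
            p *= c if nxt == L else 1.0 - c
    return p

def response_prob(responses, states, g):
    """P(M_1, ..., M_n | T_1, ..., T_n), with 0 = correct and 1 = error."""
    p = 1.0
    for r, s in zip(responses, states):
        p_correct = 1.0 if s == L else g
        p *= p_correct if r == 0 else 1.0 - p_correct
    return p

def sequence_prob(responses, c, g):
    """P(M_1, ..., M_n), summing over every sequence of theoretical states."""
    return sum(path_prob(states, c) * response_prob(responses, states, g)
               for states in itertools.product((L, U), repeat=len(responses)))

def next_correct_prob(history, c, g):
    """P(M_n = 0 | M_1, ..., M_{n-1}) as a ratio of sequence probabilities."""
    return sequence_prob(list(history) + [0], c, g) / sequence_prob(history, c, g)

print(round(next_correct_prob([0, 1], c=0.20, g=0.25), 2))
print(round(next_correct_prob([1, 0], c=0.20, g=0.25), 2))
```

With c = .20 and g = .25, the history 01 gives a probability of .40 for a correct response on trial 3 and the history 10 gives .70, which agree with combining the trial-3 state probabilities reported later (.20 and .60 for state L) with g.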

The nature of the transition probabilities of the system has a
fundamental influence on the properties of the system. A desirable situation
is one in which the theoretical states have the Markov property. When
a theory is Markovian in its postulated states, the probability of any
state $T_{j_n,n}$ on trial n depends only on the state of the system on trial
n-1 and is independent of the sequence of states that occurred on trials
1, ..., n-2. That is,

$$P(T_{j_n,n} \mid T_{j_1,1}, \ldots, T_{j_{n-1},n-1}) = P(T_{j_n,n} \mid T_{j_{n-1},n-1}).$$

The Markov property represents a kind of independence of history.
In a system lacking the Markov property, the future behavior of the system
is dependent both on the present state of the system and on its past
behavior. If a system has the Markov property, its future behavior depends
only on its present state. This has far-reaching implications for the
analysis of the system. If its states are Markov, then any method that
permits us to specify its present state permits us to predict its future
behavior, up to the uncertainty imposed by the probability laws that govern
the system. If the states of a theory are not Markov, then predictions
about future behavior can be improved by obtaining information about states
that occurred in the past. If this is the case, then it follows that the
description of a system provided by the theory omits important distinctions.
Clearly, the future behavior of the system has to depend only on its actual
present state, however it arrived there. Hence the finding that a theory is
not Markov in its postulated states is clear evidence that the states do not
give a complete description of the system.
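The contrast can be checked numerically for the all-or-none model introduced below, assuming c = .20 and g = .25. The postulated states are Markov by construction, but the observable responses are not: two histories that end in the same response on trial 3 can still predict trial 4 differently, because the earlier response carries information about the unobserved state. The helper names are hypothetical, and the recursion is a standard forward filter rather than anything specified in the paper.

```python
def forward(history, c, g):
    """[P(history, T_n = L), P(history, T_n = U)] with n = len(history) + 1."""
    alpha = [0.0, 1.0]  # every item starts unlearned (U)
    for r in history:   # r = 0 for a correct response, 1 for an error
        w = [alpha[0] * (1.0 if r == 0 else 0.0),    # learned items never err
             alpha[1] * (g if r == 0 else 1.0 - g)]  # unlearned items guess
        alpha = [w[0] + c * w[1], (1.0 - c) * w[1]]  # U -> L with probability c
    return alpha

def p_correct_next(history, c=0.20, g=0.25):
    """P(correct on the next trial | the whole response history)."""
    alpha = forward(history, c, g)
    return (alpha[0] + g * alpha[1]) / sum(alpha)

# Both histories end with a correct response on trial 3; only trial 2 differs.
for history in ([0, 0, 0], [0, 1, 0]):
    print(history, round(p_correct_next(history), 3))
```

The two printed probabilities (0.914 and 0.7) differ, so a description that keeps only the most recent response is not Markov, whereas the states L and U need no such history.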

Of course, we can assume that any description involving probabilistic
relationships among states is incomplete. If a complete description were
available, then the behavior of the system would be deterministic.
However, the discovery of a Markovian structure provides a basis for further
investigation that simplifies the problem of refining the theoretical
description. If the best available theory of a process has states with
the Markov property, then further investigation can be focused on
distinguishing between relevant subsets of the class of events that are
grouped together in the theoretical description. Whatever one finds
regarding the subsets of such events can be treated as a simple
reclassification of the events. If the states of a system as described by the
theory are not Markov, then variables that are relevant to the system's
future behavior must be evaluated in relation to the sequence of states
in the history of the system, and this will generally involve a considerable
cost in theoretical complexity.

I will illustrate these ideas about Markovian theories for
nonstationary systems with a theory of simple memorizing. The kind of
experiment to which the theory applies involves presentation of pairs of
items that are unrelated. The pairs that a subject is asked to memorize
might have short words as stimuli and numbers as responses. On each
experimental trial, the experimenter presents a word and asks the
subject to give the number that he thinks is correct. After the
subject responds, the experimenter presents the correct answer. On the
first trial, of course, the subject has to guess. However, after the
subject has seen all of the pairs he can remember the correct
answer on at least some of the tests, and eventually he is able to give
the correct answer to all of the items.

The data from an experiment like this are analyzed in the form
of sequences of responses given to the individual items. For example,
on one item a subject might give the sequence of responses

0 1 0 1 1 0 0 0 ... ,

meaning that he guessed correctly on the first trial, gave an error on
trial 2, a correct response on trial 3, errors on trials 4 and 5, and
correct responses from trial 6 on.

The simple model that I will describe was first developed by
Bower [1]. According to the model, an individual item is learned in
an all-or-none fashion. That is, at the beginning of the experiment
each item is unlearned. On each study trial, there is some probability
that the item becomes learned. This probability is a constant; that is,
the probability of learning an item does not increase over trials during
the experiment. Once an item is learned, the subject is assumed to
remember it for the remainder of the experimental trials. Prior to learning
an item, the subject has to guess the answer on each of the item's tests.

Putting this more formally, the theory postulates two states in which
we can find an item, U and L, for unlearned and learned. Each item
begins in state U, and on each trial there is a constant probability called
c that the item goes from state U to state L. State L is absorbing;
that is, once an item goes into state L it stays there. This set of
assumptions can be expressed in standard notation as

$$(P(L_1), P(U_1)) = (0, 1),$$

$$P(T_n \mid T_{n-1}) = \begin{array}{c|cc} & L_n & U_n \\ \hline L_{n-1} & 1 & 0 \\ U_{n-1} & c & 1-c \end{array} \qquad (21)$$

where the first equation states the assumption that all the items start
in state U, and the second equation states the assumption of a constant
probability of a transition from U to L and the assumption that L is
an absorbing state.
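Applying the transition matrix of equation (21) repeatedly to the initial vector (0, 1) gives the state probabilities on each trial, and they agree with the closed form $P(L_n) = 1-(1-c)^{n-1}$ that the paper uses later for $H_n(T)$. A minimal sketch, with c = .20 chosen to match the later calculations and a hypothetical function name:

```python
def state_probs(n, c):
    """Return [P(L_n), P(U_n)] by applying the transition matrix n - 1 times."""
    p = [0.0, 1.0]  # trial 1: P(L_1) = 0, P(U_1) = 1
    for _ in range(n - 1):
        # Row L of the matrix is (1, 0); row U is (c, 1 - c).
        p = [p[0] + c * p[1], (1.0 - c) * p[1]]
    return p

c = 0.20
for n in range(1, 5):
    p_l, _ = state_probs(n, c)
    closed_form = 1.0 - (1.0 - c) ** (n - 1)
    print(n, round(p_l, 3), round(closed_form, 3))
```

Both columns read 0.0, 0.2, 0.36 and 0.488 for trials 1 to 4, in line with the state probabilities P(L_3) = .36 and P(L_4) = .49 used later.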

The final assumption links these ideas about the postulated states
to probabilities of response. There are two possible responses on each
trial: the subject can be correct or wrong. A correct response is
denoted 0 and an error is denoted 1. While an item is in state U
there is a probability g, assumed to be constant, of a correct response
by guessing. After the item goes into state L the probability of a
correct response is assumed to be 1. This is stated in the following
equation:

$$\begin{array}{c|cc} & 0 & 1 \\ \hline L & 1 & 0 \\ U & g & 1-g \end{array} \qquad (22)$$

The major simplification resulting from the Markov assumption can
be seen readily. As can be seen from equations (19) and (20), predictions
about the response on trial n based on previous responses should use the
entire response sequence through trial n-1. However, if the theory is
correct then a prediction about the postulated state of the system on
trial n based on previous theoretical states needs only to use the state
of the system on trial n-1, and the sequence of states before trial
n-1 can be ignored.

The information-theoretical structure of the theory can be
considered in two general ways. The first is closer to the earlier
discussion given for stationary systems. On each trial, the sequence of
responses given for an item can be considered an empirical explanans,
the response on that trial an explanandum, and the postulated theoretical
states mediate between the two in the way considered earlier. For the
following calculations, I have taken the probability of learning, c, at
.20 and the prelearning guessing probability, g, at .25. Then, for
trial 2, we would have

$$\begin{array}{cc|cc} & & L_2 & U_2 \\ & & (.20) & (.80) \\ \hline (.25) & 0 & .20 & .80 \\ (.75) & 1 & .20 & .80 \end{array} \qquad (23)$$


for the relationship between trial-1 responses and theoretical states on
trial 2. The relationship between theoretical states and responses on
all trials is given by equation (22) with g = .25. Combining results
given in equations (23) and (18), the information-theoretical quantities
turn out to be

$$H_2(S) = 0.56,\quad H_2(T) = 0.50,\quad H_2(M) = 0.67,$$
$$I_2(S,T) = 0.0,\quad I_2(T,M) = 0.22,\quad I_2(S,M) = 0.0. \qquad (24)$$
In general, the probabilities of theoretical states on trial
n depend on the sequences of responses on all previous trials. On
trial 3,

$$[q_{ij,3}] = \begin{array}{cc|cc} & & L_3 & U_3 \\ & & (.36) & (.64) \\ \hline (.10) & 00 & .60 & .40 \\ (.15) & 01 & .20 & .80 \\ (.30) & 10 & .60 & .40 \\ (.45) & 11 & .20 & .80 \end{array}$$

$$H_3(S) = 1.24,\quad H_3(T) = 0.65,\quad H_3(M) = 0.69,$$
$$I_3(S,T) = 0.08,\quad I_3(T,M) = 0.33,\quad I_3(S,M) = 0.04. \qquad (25)$$
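The entries of equation (25) are enough to recompute its information-theoretical quantities. A minimal sketch, assuming natural logarithms and the response matrix of equation (22) with g = .25; the helper names are hypothetical:

```python
import math

def entropy(probs):
    """Uncertainty in natural-log units."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def transmitted(joint):
    """H(rows) + H(columns) - H(joint) for a joint probability table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return entropy(rows) + entropy(cols) - entropy([p for row in joint for p in row])

g = 0.25
p_s = [0.10, 0.15, 0.30, 0.45]                 # P(00), P(01), P(10), P(11)
q = [[0.60, 0.40], [0.20, 0.80],               # P(L3), P(U3) given each history
     [0.60, 0.40], [0.20, 0.80]]
r = [[1.0, 0.0], [g, 1.0 - g]]                 # P(correct), P(error) given L, U

s_t = [[ps * x for x in row] for ps, row in zip(p_s, q)]      # P(S, T3)
p_t = [sum(col) for col in zip(*s_t)]                         # P(T3)
t_m = [[pt * x for x in row] for pt, row in zip(p_t, r)]      # P(T3, M3)
s_m = [[sum(row[j] * r[j][k] for j in range(2)) for k in range(2)]
       for row in s_t]                                        # P(S, M3)
p_m = [sum(col) for col in zip(*t_m)]                         # P(M3)

print(f"H3(S) = {entropy(p_s):.2f}, H3(T) = {entropy(p_t):.2f}, H3(M) = {entropy(p_m):.2f}")
print(f"I3(S,T) = {transmitted(s_t):.2f}, I3(T,M) = {transmitted(t_m):.2f}, "
      f"I3(S,M) = {transmitted(s_m):.2f}")
```

The printout matches equation (25). It also shows the pattern the paper emphasizes: the information transmitted between the two observable sets (0.04) is smaller than that between either set and the postulated states (0.08 and 0.33).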


A similar situation exists for trial 4, with the explanans consisting
of the eight values given in equation (20). The information-theoretical
quantities are

$$H_4(S) = 1.88,\quad H_4(T) = 0.69,\quad H_4(M) = 0.67,$$
$$I_4(S,T) = 0.19,\quad I_4(T,M) = 0.38,\quad I_4(S,M) = 0.11. \qquad (26)$$
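The same bookkeeping can be automated for any trial by enumerating the $2^{n-1}$ response histories and carrying the joint probability of each history with the states L and U. The sketch assumes c = .20, g = .25 and natural logarithms; `forward` and `trial_statistics` are hypothetical names, and the recursion is a standard forward filter rather than the author's own program.

```python
import itertools
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def transmitted(joint):
    """H(rows) + H(columns) - H(joint), clamped at zero against rounding noise."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    h = entropy(rows) + entropy(cols) - entropy([p for row in joint for p in row])
    return max(0.0, h)

def forward(history, c, g):
    """[P(history, T_n = L), P(history, T_n = U)] for n = len(history) + 1."""
    alpha = [0.0, 1.0]  # all items start unlearned
    for r in history:
        w = [alpha[0] * (1.0 if r == 0 else 0.0),     # learned: always correct
             alpha[1] * (g if r == 0 else 1.0 - g)]   # unlearned: guess
        alpha = [w[0] + c * w[1], (1.0 - c) * w[1]]   # U -> L with probability c
    return alpha

def trial_statistics(n, c=0.20, g=0.25):
    """Return I_n(S,T), I_n(T,M) and I_n(S,M) for the all-or-none model."""
    r = [[1.0, 0.0], [g, 1.0 - g]]                                      # P(M_n | T_n)
    s_t = [forward(h, c, g) for h in itertools.product((0, 1), repeat=n - 1)]
    p_t = [sum(col) for col in zip(*s_t)]                               # P(T_n)
    t_m = [[p_t[j] * r[j][k] for k in range(2)] for j in range(2)]      # P(T_n, M_n)
    s_m = [[sum(row[j] * r[j][k] for j in range(2)) for k in range(2)]
           for row in s_t]                                              # P(S, M_n)
    return transmitted(s_t), transmitted(t_m), transmitted(s_m)

for n in (2, 3, 4):
    i_st, i_tm, i_sm = trial_statistics(n)
    print(f"trial {n}: I(S,T) = {i_st:.2f}, I(T,M) = {i_tm:.2f}, I(S,M) = {i_sm:.2f}")
```

For n = 2, 3 and 4 the printed values agree with the information-transmitted entries of equations (24), (25) and (26). The loop over histories is what makes large n expensive.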


An overall impression of the informational structure of the theory
may be obtained by examining the sums of information transmitted across
trials. For the parameter values used above, the sums for trials 2-13
are

$$\sum_{j=2}^{13} I_j(S,T) = 2.61,\quad \sum_{j=2}^{13} I_j(T,M) = 3.47,\quad \sum_{j=2}^{13} I_j(S,M) = 1.74.$$

Eventually, these sums converge, but the values for c = .20 do not
converge within the trials for which I carried out calculations. (The
calculations were not carried further because an inordinate amount of time
is needed for computation of $I_j$ when j becomes large.) However, to
provide some indication of the behavior of the statistics, I carried
out calculations for higher values of c until the sums did approach
asymptotic values. The results of these calculations are in Table 1.
The main findings of interest in these calculations are the strong
dependence of the amount of information transmitted on the parameter
values, and further illustrations of the fact that the dependence
between empirical variables is never stronger than the dependence
between either set of empirical variables and the theoretical variables.

Table 1

Information-Theoretical Statistics for All-or-None Learning

Parameters          Σ I_j(S,T)   Σ I_j(T,M)   Σ I_j(S,M)
c = .40, g = .25       0.93         1.84         0.66
c = .40, g = .50       0.64         1.09         0.29
c = .60, g = .25       0.29         0.97         0.21
c = .60, g = .50       0.19         0.58         0.09

Sums are taken over j = 1, 2, ..., ∞.

The preceding calculations all have to do with the theoretical states
considered as mediators between preceding response sequences and the
response on trial n. Another view of the situation can be obtained by
examining the sequence of theoretical states, without regard for the
observable responses. This latter point of view is concerned with
uncertainty and information transmitted at the level of the states that
are postulated in the theory, and in some ways gives a more direct
evaluation of our state of knowledge than the analysis that deals with
responses, assuming that the theory is accepted as the best
available understanding of the system in question.

The analyses of theoretical sequences are similar to those given
earlier in connection with equations (18)-(20), except that the Markovian
structure of the theory permits us to ignore states of the system
occurring in the past. For any trial n, the probability of any state depends
only on the state on trial n-1; in fact, the probabilities of
state-to-state transitions are constant. For trial 2, we have

$$\begin{array}{cc|cc} & & L_2 & U_2 \\ & & (.20) & (.80) \\ \hline (0.0) & L_1 & 1 & 0 \\ (1.0) & U_1 & .20 & .80 \end{array} \qquad (27)$$

$$H_1(T) = 0.0,\quad H_2(T) = 0.50,\quad H_{12}(T\times T) = 0.50,\quad I_2(T,T) = 0.0.$$


For trial 3, the probabilities given at the top of equation (27) are the
probabilities of the explanans, and the transition probabilities remain
unchanged. The state probabilities on trial 3 are $P(L_3) = .36$,
$P(U_3) = .64$. The information-theoretical quantities are

$$H_3(T) = 0.65,\quad H_{23}(T\times T) = 1.08,\quad I_3(T,T) = 0.07.$$

On trial 4, the state probabilities are $P(L_4) = .49$, $P(U_4) = .51$,
leading to

$$H_4(T) = 0.69,\quad H_{34}(T\times T) = 0.97,\quad I_4(T,T) = 0.37.$$


Aside from the marked increase in simplicity, compared with the
observable sequences, the theoretical sequences also have more information
transmitted. For the trials given above, the calculations given
earlier for $I_n(S,M)$ are 0.0, 0.04, and 0.17. The fact that values of
information transmitted are higher in the theoretical sequences than in
the observable sequences is not an accident. Any matrix of probabilities
in the observable responses like those given in equations (18)-(20) can
be derived as the product of three matrices:

$$[P(T_{n-1} \mid S_{n-1})]\,[P(T_n \mid T_{n-1})]\,[P(M_n \mid T_n)].$$

The matrix designated by the central term is the transition matrix for
the theoretical states, and we have already seen that the information
transmitted by a product is no greater than the information transmitted
by any of the matrices multiplied to form the product.
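The factorization can be checked numerically. The sketch below builds the three matrices for trial 4 of the all-or-none model, assuming c = .20, g = .25 and natural logarithms, multiplies them to obtain the matrix linking response histories to the trial-4 response, and compares the information transmitted by each factor (each driven by its own input distribution) with that of the product. Function names are hypothetical; the bound being illustrated is the standard data-processing inequality of information theory.

```python
import itertools
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def matmul(a, b):
    return [[sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
            for i in range(len(a))]

def transmitted(p_in, channel):
    """Information transmitted by a row-stochastic matrix driven by input p_in."""
    joint = [[p * x for x in row] for p, row in zip(p_in, channel)]
    p_out = [sum(col) for col in zip(*joint)]
    info = entropy(p_in) + entropy(p_out) - entropy([p for row in joint for p in row])
    return max(0.0, info)  # clamp floating-point noise; I is never negative

def last_state_posterior(history, c, g):
    """Return P(history) and P(T_{n-1} | history) for responses on trials 1..n-1."""
    alpha = [0.0, 1.0]  # trial 1: every item is unlearned
    for t, r in enumerate(history):
        if t > 0:  # move to the next trial: U -> L with probability c
            alpha = [alpha[0] + c * alpha[1], (1.0 - c) * alpha[1]]
        alpha = [alpha[0] * (1.0 if r == 0 else 0.0),
                 alpha[1] * (g if r == 0 else 1.0 - g)]  # weight by the response
    total = sum(alpha)
    return total, [a / total for a in alpha]

c, g, n = 0.20, 0.25, 4
p_s, first = [], []  # P(S) and the rows of [P(T_{n-1} | S)]
for history in itertools.product((0, 1), repeat=n - 1):
    p, posterior = last_state_posterior(history, c, g)
    p_s.append(p)
    first.append(posterior)
transition = [[1.0, 0.0], [c, 1.0 - c]]  # [P(T_n | T_{n-1})], rows and columns L, U
response = [[1.0, 0.0], [g, 1.0 - g]]    # [P(M_n | T_n)], columns correct, error

p_prev = [sum(p * row[j] for p, row in zip(p_s, first)) for j in range(2)]        # P(T_{n-1})
p_curr = [sum(p_prev[i] * transition[i][j] for i in range(2)) for j in range(2)]  # P(T_n)
product = matmul(matmul(first, transition), response)                             # [P(M_n | S)]

factors = {
    "I(S, T_{n-1})": transmitted(p_s, first),
    "I(T_{n-1}, T_n)": transmitted(p_prev, transition),
    "I(T_n, M_n)": transmitted(p_curr, response),
}
for name, value in factors.items():
    print(f"{name:16s} = {value:.2f}")
print(f"{'I(S, M_n)':16s} = {transmitted(p_s, product):.2f}")
```

Under these assumptions each factor transmits roughly 0.37 to 0.38, while the product transmits about 0.11, the value of $I_4(S,M)$ in equation (26).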

Both the simplicity and information-theoretical advantages of
the theoretical structure are illustrated further by the analysis of
information transmitted summed over trials. The analysis of this
statistic for sequences of observable responses was discussed earlier;
each of the calculations presented there required nearly an hour of
calculation on a medium-small computer (an IBM 1800 with 16,000 words of
core storage and 4-microsecond memory access). The calculations for the
theoretical sequences can be done quite easily by hand. In general,


$$I_n(T,T) = H_{n-1}(T) + H_n(T) - H_{n-1,n}(T\times T),$$

where

$$H_n(T) = -P(L_n)\log P(L_n) - P(U_n)\log P(U_n)$$
$$= -[1-(1-c)^{n-1}]\log[1-(1-c)^{n-1}] - (1-c)^{n-1}\log(1-c)^{n-1},$$

$$H_{n-1,n}(T\times T) = -P(L_{n-1})\log P(L_{n-1}) - P(U_{n-1},U_n)\log P(U_{n-1},U_n) - P(U_{n-1},L_n)\log P(U_{n-1},L_n)$$
$$= -[1-(1-c)^{n-2}]\log[1-(1-c)^{n-2}] - (1-c)^{n-1}\log(1-c)^{n-1} - c(1-c)^{n-2}\log[c(1-c)^{n-2}].$$
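The closed forms above are easy to evaluate directly. A minimal sketch, assuming natural logarithms and c = .20; the function names are hypothetical, and `plog` is simply the term $-p \log p$ with the usual convention that it vanishes at p = 0:

```python
import math

def plog(p):
    """The term -p log p, taken as 0 when p = 0."""
    return p * math.log(1.0 / p) if p > 0 else 0.0

def h_state(n, c):
    """H_n(T), using P(U_n) = (1-c)**(n-1)."""
    u = (1.0 - c) ** (n - 1)
    return plog(1.0 - u) + plog(u)

def h_pair(n, c):
    """H_{n-1,n}(T x T) from the cells (L,L), (U,U) and (U,L); (L,U) is impossible."""
    u = (1.0 - c) ** (n - 2)  # P(U_{n-1})
    return plog(1.0 - u) + plog((1.0 - c) * u) + plog(c * u)

def info_tt(n, c):
    """I_n(T,T) = H_{n-1}(T) + H_n(T) - H_{n-1,n}(T x T), clamped at zero."""
    return max(0.0, h_state(n - 1, c) + h_state(n, c) - h_pair(n, c))

c = 0.20
for n in range(2, 7):
    print(n, f"{h_state(n, c):.2f}", f"{h_pair(n, c):.2f}", f"{info_tt(n, c):.2f}")
```

For c = .20 the sketch gives H_2(T) = H_{12}(T×T) = 0.50 with I_2(T,T) = 0.00, and on trial 4 it gives H_4(T) = 0.69, H_{34}(T×T) = 0.97 and I_4(T,T) = 0.37.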

Combining terms and summing across values of n,

$$\sum_{j=2}^{\infty} I_j(T,T) = \log c - \frac{(1-c)^2}{c^2}\log(1-c) - \sum_{k=1}^{\infty}[1-(1-c)^k]\log[1-(1-c)^k].$$

For values of c of .20, .40, and .60 the sums $\sum_n I_n(T,T)$ for the
theoretical sequences are 5.67, 3.99, and 1.16. The last two values can be
compared with values given in Table 1 for the observable sequences.

Since there is a simpler structure and greater information transmitted
in the theoretical sequence than there is in the sequence of observations,
the theory provides an advantageous basis for developing new knowledge.

The development of new measurement techniques for increasing dependencies
between observations and the theoretical states, and the refinement of
theory to provide increased information transmitted in the trial-to-trial
transitions, both constitute additions to knowledge that are frequent in
scientific investigation and that are explicable within the framework
of the present analysis.


Conclusions 

In presenting the information-theoretical analysis of theoretical
entities in this paper, a number of rather routine calculations have been
carried out. To some extent, the significance of these analyses is just
that they can be carried out. The analyses of this paper serve as an
existence proof that it is possible to perform analyses that are
relevant to evaluating our state of knowledge when we have statistical
knowledge about a system. It is widely recognized that theoretical
entities can serve as a heuristic aid in the development of new empirical
knowledge about a system, and in some cases can provide a substantial
simplification in the representation of knowledge. These facts are
clarified and specified to some extent by the present analysis. We
have seen that the information transmitted by dependencies between a set
of empirical variables and a set of theoretical variables is generally
greater than the information transmitted between sets of empirical
variables, and this fact clarifies the usefulness of postulated entities
in guiding empirical investigations. The role of theoretical entities
in relation to simplicity is especially striking in nonstationary
systems, particularly when the theoretical system has the Markov property.